feat: support stable diffusion v1-5 with qnn #234

ziyuanguo1998 wants to merge 12 commits into main
Conversation
Pull request overview
Adds an AITK workflow + supporting scripts/configs to optimize and run Stable Diffusion v1.5 with ONNX Runtime QNN EP (plus supporting CPU/CUDA/OpenVINO paths), including data generation for static quantization and model adaptation/monkey-patching for QNN.
Changes:
- Introduces a full `sd-legacy-stable-diffusion-v1-5/aitk` workflow (configs, optimization/inference scripts, evaluation tooling, sample notebook).
- Adds QDQ/QNN pipeline utilities (EP registration, QDQ config shaping, ORT/OpenVINO pipelines, ONNX save/patch helpers).
- Registers the model + dataset in `.aitk` configs and adds SD-specific requirements.
Reviewed changes
Copilot reviewed 30 out of 31 changed files in this pull request and generated 10 comments.
Summary per file:
| File | Description |
|---|---|
| sd-legacy-stable-diffusion-v1-5/olive/winml.py | Helper to enumerate WinML execution provider library paths. |
| sd-legacy-stable-diffusion-v1-5/olive/sd_utils/ort.py | Updates Olive footprint filename lookup for optimized ONNX extraction. |
| sd-legacy-stable-diffusion-v1-5/aitk/winml.py | AITK-side helper to enumerate WinML execution provider library paths. |
| sd-legacy-stable-diffusion-v1-5/aitk/user_script.py | Model loaders, input builders, and dataloaders for Olive passes (incl. LoRA merge + QNN patch hook). |
| sd-legacy-stable-diffusion-v1-5/aitk/stable_diffusion.py | End-to-end CLI for optimizing and running SD v1.5 across providers/formats (incl. QDQ). |
| sd-legacy-stable-diffusion-v1-5/aitk/sd_utils/qdq_xl.py | ORT SDXL pipeline wrappers with data-save hooks (for QDQ data generation). |
| sd-legacy-stable-diffusion-v1-5/aitk/sd_utils/qdq.py | QDQ-specific Olive config shaping + ONNX pipeline wrapper + EP registration + QDQ pipeline loader. |
| sd-legacy-stable-diffusion-v1-5/aitk/sd_utils/ov.py | OpenVINO pipeline implementation and Olive config helpers for OV conversion/runtime. |
| sd-legacy-stable-diffusion-v1-5/aitk/sd_utils/ort.py | ORT/CUDA optimization helpers, footprint parsing, and pipeline materialization (incl. QNN ctx bin copy). |
| sd-legacy-stable-diffusion-v1-5/aitk/sd_utils/onnx_patch.py | Patched ONNX model wrapper to support saving external weights alongside ONNX artifacts. |
| sd-legacy-stable-diffusion-v1-5/aitk/sd_utils/config.py | Shared runtime config values (sample sizes, flags, data dir). |
| sd-legacy-stable-diffusion-v1-5/aitk/sd_qnn_workflow.py | AITK workflow driver orchestrating conversion, data generation, and quantized model generation. |
| sd-legacy-stable-diffusion-v1-5/aitk/sd_qnn_workflow.json.config | AITK workflow UI/config template for QNN conversion + quantization + evaluation. |
| sd-legacy-stable-diffusion-v1-5/aitk/sd_qnn_workflow.json | Olive/AITK workflow definition for SD v1.5 QNN target. |
| sd-legacy-stable-diffusion-v1-5/aitk/model_project.config | Registers the workflow in the model project configuration. |
| sd-legacy-stable-diffusion-v1-5/aitk/model_adaptations.py | QNN-focused UNet monkey-patches (attention/activations/norm/proj changes) for compatibility/perf. |
| sd-legacy-stable-diffusion-v1-5/aitk/info.yml | AITK metadata for the SD v1.5 QNN recipe. |
| sd-legacy-stable-diffusion-v1-5/aitk/inference_sample.ipynb | Sample notebook demonstrating QNN EP registration and inference. |
| sd-legacy-stable-diffusion-v1-5/aitk/evaluation.py | Evaluation/data-generation script (CLIP/FID/MSE/HPSv2 hooks, dataset streaming). |
| sd-legacy-stable-diffusion-v1-5/aitk/config_vae_encoder.json | Olive config for VAE encoder conversion/optimization/quantization. |
| sd-legacy-stable-diffusion-v1-5/aitk/config_vae_decoder.json | Olive config for VAE decoder conversion/optimization/quantization (+ optional EP context bin). |
| sd-legacy-stable-diffusion-v1-5/aitk/config_unet.json | Olive config for UNet conversion/optimization/quantization (+ optional EP context bin). |
| sd-legacy-stable-diffusion-v1-5/aitk/config_text_encoder.json | Olive config for text encoder conversion/optimization/quantization (+ surgery/context bin). |
| sd-legacy-stable-diffusion-v1-5/aitk/config_safety_checker.json | Olive config for safety checker conversion/optimization. |
| sd-legacy-stable-diffusion-v1-5/aitk/README.md | Usage documentation for data generation, optimization, and evaluation. |
| sd-legacy-stable-diffusion-v1-5/aitk/.gitignore | Ignores generated caches, artifacts, and results. |
| .aitk/scripts/project_processor.py | Adds HuggingFace icon mapping for the SD v1.5 model family key. |
| .aitk/requirements/requirements-WCR-SD.txt | Adds SD workflow runtime requirements (accelerate/diffusers/torch-fidelity pins). |
| .aitk/configs/model_list.json | Registers SD v1.5 model entry and adds dataset link for phiyodr/coco2017. |
| .aitk/configs/checks.json | Updates check counters to reflect the new model/workflow assets. |
```python
optimized_model_dir = script_dir / "models" / "optimized" / model_id

if common_args.clean_cache:
    shutil.rmtree(common_args.cache_dir, ignore_errors=True)
```
If --clean_cache is provided without --cache_dir, shutil.rmtree(common_args.cache_dir, ...) will raise a TypeError because cache_dir is None. Consider either making --cache_dir required when --clean_cache is set, or defaulting to a known cache path (e.g., script_dir / "cache") and guarding against None.
```diff
-    shutil.rmtree(common_args.cache_dir, ignore_errors=True)
+    # If no cache_dir was provided, default to a "cache" directory under script_dir.
+    cache_dir = common_args.cache_dir or (script_dir / "cache")
+    if cache_dir is not None:
+        shutil.rmtree(cache_dir, ignore_errors=True)
```
```python
worker_script = os.path.abspath('winml.py')
result = subprocess.check_output([sys.executable, worker_script], text=True)
paths = json.loads(result)
```
worker_script = os.path.abspath('winml.py') depends on the current working directory, so this can fail (or pick up an unintended/malicious winml.py) when the script is launched from another directory. Use a path relative to this module (e.g., Path(__file__).resolve().parents[1] / "winml.py" or similar) to ensure the intended helper is executed.
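A minimal sketch of the module-relative lookup (assuming `winml.py` sits next to this script; adjust `parent`/`parents[...]` to the real layout):

```python
import json
import subprocess
import sys
from pathlib import Path

# Resolve winml.py relative to this module, not the current working directory.
worker_script = Path(__file__).resolve().parent / "winml.py"
result = subprocess.check_output([sys.executable, str(worker_script)], text=True)
paths = json.loads(result)
```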
```python
if qdq_args.save_data:
    pipeline.save_data_dir = script_dir / qdq_args.data_dir / common_args.prompt
    os.makedirs(pipeline.save_data_dir, exist_ok=True)
else:
```
pipeline.save_data_dir = script_dir / qdq_args.data_dir / common_args.prompt uses the raw prompt as a directory name. Prompts can contain path separators or characters invalid on Windows, which can break saving or allow writing outside the intended directory. Sanitize/slugify the prompt (or hash it) before using it in a filesystem path.
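One possible approach, combining a readable slug with a short hash so distinct prompts never collide (`prompt_to_dirname` is a hypothetical helper, not part of this PR):

```python
import hashlib

def prompt_to_dirname(prompt: str) -> str:
    # Replace anything that is not alphanumeric, "-", or "_", cap the length,
    # then append a hash so sanitized prompts stay unique.
    safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in prompt)[:40]
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]
    return f"{safe}-{digest}"

pipeline.save_data_dir = script_dir / qdq_args.data_dir / prompt_to_dirname(common_args.prompt)
```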
```markdown
### Test and evaluate

`python .\evaluation.py --script_dir .\ --model_id stable-diffusion-v1-5/stable-diffusion-v1-5 --num_inference_steps 25 --seed 0 --num_data 100 --guidance_scale 7.5 --provider QNNExecutionProvider --model_dir optimized-qnn_qdq`
```
The README’s evaluation example uses --model_dir optimized-qnn_qdq, but stable_diffusion.py writes optimized models under models/optimized/<model_id> (no provider/format suffix). As written, the evaluation command will look in a directory that is never created; update the README command (or align the output directory naming in code).
```diff
-`python .\evaluation.py --script_dir .\ --model_id stable-diffusion-v1-5/stable-diffusion-v1-5 --num_inference_steps 25 --seed 0 --num_data 100 --guidance_scale 7.5 --provider QNNExecutionProvider --model_dir optimized-qnn_qdq`
+`python .\evaluation.py --script_dir .\ --model_id stable-diffusion-v1-5/stable-diffusion-v1-5 --num_inference_steps 25 --seed 0 --num_data 100 --guidance_scale 7.5 --provider QNNExecutionProvider --model_dir models/optimized/stable-diffusion-v1-5/stable-diffusion-v1-5`
```
```python
if not common_args.optimize:
    model_dir = unoptimized_model_dir if common_args.test_unoptimized else optimized_model_dir
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        if provider == "openvino":
            from sd_utils.ov import get_ov_pipeline

            pipeline = get_ov_pipeline(common_args, ov_args, optimized_model_dir)
        elif common_args.format == "qdq":
            from sd_utils.qdq import get_qdq_pipeline

            pipeline = get_qdq_pipeline(model_dir, common_args, qdq_args, script_dir)
        else:
```
The QDQ export path uses fixed batch=1 shapes (see dynamic_shape_to_fixed in the configs), but the CLI still allows --batch_size > 1 and passes it through to get_qdq_pipeline. This will likely fail at runtime with shape mismatches. Consider enforcing batch_size==1 when --format qdq (or documenting/handling larger batches).
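A minimal guard along those lines (a sketch; the argument names are taken from the snippet above):

```python
# QDQ models are exported with fixed batch=1 shapes, so reject larger batches early.
if common_args.format == "qdq" and common_args.batch_size != 1:
    raise ValueError(
        "--format qdq exports fixed batch=1 shapes (dynamic_shape_to_fixed); "
        "run with --batch_size 1."
    )
```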
| elif provider == "qnn" and submodel_name not in ("vae_encoder"): | ||
| config["systems"]["local_system"]["accelerators"][0]["device"] = "npu" | ||
| config["systems"]["local_system"]["accelerators"][0]["execution_providers"] = ["QNNExecutionProvider"] | ||
| config["passes"]["convert"]["target_opset"] = 20 | ||
|
|
||
| # Quantization params | ||
| if submodel_name not in ("text_encoder"): |
The condition submodel_name not in ("vae_encoder") is using a string instead of a 1-element tuple, so it performs substring membership rather than comparing names. This is brittle and can lead to incorrect branching if submodel_name ever changes; use submodel_name != "vae_encoder" (or not in ("vae_encoder",)) instead. Same issue for the ("text_encoder") check below.
| elif provider == "qnn" and submodel_name not in ("vae_encoder"): | |
| config["systems"]["local_system"]["accelerators"][0]["device"] = "npu" | |
| config["systems"]["local_system"]["accelerators"][0]["execution_providers"] = ["QNNExecutionProvider"] | |
| config["passes"]["convert"]["target_opset"] = 20 | |
| # Quantization params | |
| if submodel_name not in ("text_encoder"): | |
| elif provider == "qnn" and submodel_name != "vae_encoder": | |
| config["systems"]["local_system"]["accelerators"][0]["device"] = "npu" | |
| config["systems"]["local_system"]["accelerators"][0]["execution_providers"] = ["QNNExecutionProvider"] | |
| config["passes"]["convert"]["target_opset"] = 20 | |
| # Quantization params | |
| if submodel_name != "text_encoder": |
```python
else:
    if "src_height" in meta:
        orig_height, orig_width = meta["src_height"], meta["src_width"]
        image = [cv2.resize(img, (orig_width, orig_width)) for img in image]
```
cv2.resize(img, (orig_width, orig_width)) uses the width value for both dimensions, which will distort images when orig_height != orig_width. This should resize to (orig_width, orig_height) (and note OpenCV expects size as (width, height)).
```diff
-image = [cv2.resize(img, (orig_width, orig_width)) for img in image]
+image = [cv2.resize(img, (orig_width, orig_height)) for img in image]
```
```python
        model (nn.Module): The model in which to replace Attention modules.

    """
    traverse_and_replace(model, attention_processor.Attention, lambda orig_attn: SHAAttention(orig_attn))
```
This 'lambda' is just a simple wrapper around a callable object. Use that object directly.
```diff
-traverse_and_replace(model, attention_processor.Attention, lambda orig_attn: SHAAttention(orig_attn))
+traverse_and_replace(model, attention_processor.Attention, SHAAttention)
```
```python
try:
    shutil.copyfile(src_path, dst_path)
except shutil.SameFileError:
```
'except' clause does nothing but pass and there is no explanatory comment.
```python
dst_path = Path(save_directory).joinpath(ONNX_EXTERNAL_WEIGHTS_NAME)
try:
    shutil.copyfile(src_path, dst_path)
except shutil.SameFileError:
```
'except' clause does nothing but pass and there is no explanatory comment.
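The simplest fix is to keep the swallow but say why (a sketch):

```python
try:
    shutil.copyfile(src_path, dst_path)
except shutil.SameFileError:
    # src_path and dst_path already refer to the same external-weights file,
    # so there is nothing to copy.
    pass
```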
```python
copy_olive_config(history_folder, config_name, cache_dir, output_dir, activation_type, precision)

# run stable_diffusion.py to generate onnx unoptimized model
subprocess.run([sys.executable, "stable_diffusion.py",
```
We had better follow the whisper recipe here: share the original model and skip this step if it already exists.
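Roughly, a guard like the following (the output path is illustrative; the real location depends on where stable_diffusion.py writes the unoptimized model):

```python
from pathlib import Path
import subprocess
import sys

# Hypothetical shared location for the unoptimized ONNX model.
unoptimized_dir = Path(history_folder) / "models" / "unoptimized"
if unoptimized_dir.exists():
    print(f"Reusing existing unoptimized model at {unoptimized_dir}")
else:
    subprocess.run([sys.executable, "stable_diffusion.py"], check=True)
```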
```python
# run evaluation.py to generate data
subprocess.run([sys.executable, "evaluation.py",
    "--script_dir", history_folder,
```
Same for the generated data: share it, so there is no need to rerun this step if it already exists.
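Mirroring the sketch above (again with an illustrative path):

```python
# Hypothetical shared location for the generated calibration data.
data_dir = Path(history_folder) / "data"
if not data_dir.exists():
    subprocess.run([sys.executable, "evaluation.py", "--script_dir", history_folder], check=True)
```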
```diff
@@ -0,0 +1,21 @@
+import json
```
Do we need this file? If not, please remove it, as it is big.